Memory controller and method for optimized read/modify/write performance

ABSTRACT

A memory controller optimizes execution of a read/modify/write (RMW) command by breaking the RMW command into separate and unique read and write commands that do not need to be executed together, but only need to be executed in the proper sequence. The most preferred embodiments use a separate RMW queue in the controller in conjunction with the read queue and write queue. In other embodiments, the controller places the read and write portions of the RMW into the read and write queues, and the write queue has a dependency indicator associated with the RMW write command to ensure the controller maintains the proper execution sequence. The embodiments allow the memory controller to translate RMW commands into read and write commands with the proper sequence of execution to preserve data coherency.

CROSS-REFERENCE TO PARENT APPLICATIONS

This patent application is a continuation of Ser. No. 11/779,277 filed on Jul. 18, 2007, which is a continuation of U.S. Pat. No. 7,328,317. Both of these parent applications are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Technical Field

This invention generally relates to computer memory systems, and more specifically relates to optimizing read/modify/write control in a computer memory system.

2. Background Art

Since the dawn of the computer age, computer systems have evolved into extremely sophisticated devices that may be found in many different settings. Computer systems typically include a combination of hardware (e.g., semiconductors, circuit boards, etc.) and software (e.g., computer programs). One key component in any computer system is memory.

Modern computer systems typically include dynamic random-access memory (DRAM). DRAM is different from static RAM in that its contents must be continually refreshed to avoid losing data. A static RAM, in contrast, maintains its contents as long as power is present without the need to refresh the memory. This maintenance of memory in a static RAM comes at the expense of additional transistors for each memory cell that are not required in a DRAM cell. For this reason, DRAMs typically have densities significantly greater than static RAMs, thereby providing a much greater amount of memory at a lower cost than is possible using static RAM.

However, DRAMs are also more prone to errors in the data read from the memory. Sophisticated error correction circuitry has been developed that allows detecting errors in a DRAM. During a typical read cycle, a cache line is read, causing a corresponding read of an error correction code (ECC) from memory. The error correction circuitry uses the ECC to detect if there are errors in the data within the ECC boundary. The ECC boundary is the amount of data or size of the chunk of memory used to generate the ECC (such as a cache line). When data is written to memory, the error correction circuitry generates the ECC, which is then written to the cache line with the data, and then into the memory.

Modern DRAM memory controllers support a memory command known as Read/Modify/Write (RMW). A RMW command is used to write less data than a full cache line. Before the write operation, the full cache line of data must be read to be combined with the new data of the RMW command. This is necessary to assure data integrity in the memory and so that a new error correction code can be generated for the store. In the prior art, once the RMW cycle starts, the entire RMW sequence is performed as an atomic operation to assure data integrity. If processor reads occur just after the read operation of the RMW cycle, the processor reads have to wait until the atomic RMW operation is completed. As a result, prior art memory controllers negatively affect system performance when performing Read/Modify/Write operations due to excessive time spent processing RMW operations. Without a way of performing Read/Modify/Write operations that does not make processor read cycles wait, the computer industry will continue to be plagued with decreased performance during Read/Modify/Write cycles.

DISCLOSURE OF INVENTION

A memory controller optimizes execution of read/modify/write (RMW) commands by breaking the RMW commands into separate and unique read and write commands that do not need to be executed together, but only in the proper sequence. Some embodiments use a separate RMW queue in the controller in conjunction with the read queue and write queue. In other embodiments, the controller places the read and write portions of the RMW into the read and write queues, and the write queue has a dependency indicator associated with the RMW write command to ensure the controller maintains the proper execution sequence. The embodiments allow the memory controller to translate RMW commands into read and write commands with the proper sequence of execution to preserve data coherency.

The foregoing and other features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings.

BRIEF DESCRIPTION OF DRAWINGS

The preferred embodiments of the present invention will hereinafter be described in conjunction with the appended drawings, where like designations denote like elements, and:

FIG. 1 is a block diagram of a memory controller in accordance with the preferred embodiments;

FIG. 2 is a sample timing diagram showing the function of the memory controller of FIG. 1;

FIG. 3 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments;

FIG. 4 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments;

FIG. 5 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments;

FIG. 6 is another block diagram of a memory controller in accordance with the preferred embodiments;

FIG. 7 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments for the memory controller in FIG. 6;

FIG. 8 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments for the memory controller in FIG. 6;

FIG. 9 is a flow diagram of a method for processing RMW operations in accordance with the preferred embodiments for the memory controller in FIG. 6;

FIG. 10 is a block diagram of a prior art memory controller; and

FIG. 11 is a sample timing diagram showing the function of the prior art memory controller of FIG. 10.

BEST MODE FOR CARRYING OUT THE INVENTION

A prior art memory controller and method are first presented herein to provide a context for the discussion of the preferred embodiments.

Referring to FIG. 10, a memory controller 1000 in accordance with the prior art includes a read queue 1020, a write queue 1030, and command formatting logic 1040. A read command 1050 from a processor may be written to the read queue 1020. The read queue 1020 includes a plurality of entries that are processed by the memory controller 1000. A write command 1060 from the processor may be written to the write queue 1030. The write queue 1030 includes a plurality of entries that are processed by the memory controller 1000. RMW commands 1065 from the processor are also written to the write queue 1030. In the memory controller 1000 read operations may have priority over write operations. RMW commands 1065 are serviced by processing the read portion of the command from the write queue and then holding the write portion of the command until the read is completed. The command formatting logic 1040 presents appropriate commands to the memory via the memory command interface 1070.

The “read/modify/write” (RMW) operation presents unique problems to the memory controller 1000. The RMW operation is so designated due to its atomic operation. Atomic operation means that once the RMW operation is commenced, all other accesses to the memory are delayed until the RMW operation is complete. The RMW operation is used for systems having error correction or systems without error correction that don't have partial write capability. In some systems the RMWs are simply stores that are less than a full cacheline in size, so the full cache line of data must be read before being combined with the RMW data and then written back into memory. By delaying processor accesses that occur during the atomic RMW cycle, each subsequent processor access suffers the delay time that resulted from waiting for the RMW cycle to complete. The result is a decrease in system performance caused by this delay.

The delay in prior art RMW cycles is illustrated by a simplified timing diagram shown in FIG. 11. The activity on the memory controller 1000 is shown under the heading “Memory Bus Operation” compared with the timing of a “Memory Bus Clock.” A first RMW cycle is designated as RMW0. The RMW0 cycle has a read command 1110 and a write command 1120. The time between the read command 1110 and the write command 1120 is a RMW time delay 1130. In the prior art memory controllers, the time delay 1130 was unproductive, since the memory controller 1000 had to delay other memory access commands until the RMW command was completed. This time delay 1130 can significantly reduce memory bandwidth in a data stream that contains a large number of RMW commands.

The preferred embodiments translate the formerly atomic read/modify/write operation into separate read and write operations using an architecture and protocol that assures that processor read cycles are not delayed while the RMW cycles are in progress. Referring to FIG. 1, a memory controller 100 in accordance with the preferred embodiments includes a read queue 120, a write queue 130, a RMW queue 135 and command formatting logic 140. A read command 150 from a processor may be written to the read queue 120. The read queue 120 includes a plurality of entries that are processed by the memory controller 100. A write command 160 from the processor may be written to the write queue 130. The write queue 130 includes a plurality of entries that are processed by the memory controller 100. A RMW command 165 from the processor may be written to the RMW queue 135. The RMW queue 135 includes a plurality of entries that are processed by the memory controller 100.

In the memory controller 100 of the preferred embodiments, read operations may have priority over write operations (similar to the prior art), so the read queue 120 is serviced until all its entries have been processed, at which time one or more entries in the write queue 130 may be processed. Since the memory controller 100 in the preferred embodiments can distinguish a RMW read from a processor read, the memory controller 100 can also give priority to processor reads over RMW reads. RMW commands can be processed sequentially, in groups or upon a certain threshold as described below. The command formatting logic 140 presents appropriate commands to the memory via the memory command interface 170.
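The priority ordering described above can be sketched in software. This is only an illustrative model, not the controller's actual logic: the function name `select_next`, the dictionary layout of a command, and the `"rmw"` tag on reads that originate from a RMW command are all assumptions introduced here, and address hazards between reordered commands are ignored.

```python
def select_next(read_queue, write_queue):
    """Pick the next command: processor reads, then RMW reads, then writes.

    Hypothetical model: each command is a dict, and a read that came from
    a RMW command carries "rmw": True so it can be told apart.
    """
    # Processor reads are given priority over RMW reads.
    for i, cmd in enumerate(read_queue):
        if not cmd.get("rmw"):
            return read_queue.pop(i)
    # No processor read is pending, so service the oldest RMW read.
    if read_queue:
        return read_queue.pop(0)
    # The read queue is empty, so a write may be processed.
    if write_queue:
        return write_queue.pop(0)
    return None


reads = [{"id": "rmw_read", "rmw": True}, {"id": "cpu_read"}]
writes = [{"id": "write"}]
order = []
while (cmd := select_next(reads, writes)) is not None:
    order.append(cmd["id"])
```

In this model the processor read is issued ahead of the older RMW read, and the write is issued only once the read queue is empty.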

The memory controller 100 in FIG. 1 processes incoming commands from the processor by identifying the type of command (read, write or RMW) and placing them in the appropriate queue. The memory controller 100 then executes the commands in the queues. The read queue 120 may be given priority. Commands on the read queue 120 and the write queue 130 are executed from the respective queue in a manner known in the prior art except where described differently herein. Execution of commands on the RMW queue is accomplished by translating them and placing them on the read and write queues as described below. This embodiment with a RMW queue takes much of the complexity out of the write queue 130 compared to prior art architectures for handling RMW commands within the write queue. The embodiment also reduces the complexity of commands to be executed by the memory controller. A RMW queue that does not execute commands directly simplifies the command execution for the memory controller. This includes optimization of command order within the queue and switching between commands in the read and write queues.

Commands in the RMW queue 135 are translated into separate read and write operations. The RMW commands are not executed out of the RMW queue 135. The memory controller 100 first writes the read portion of the RMW command in the RMW queue 135 to the read queue 120 as shown by arrow 142 in FIG. 1. The memory controller 100 then waits for data from the read command to be returned from the read portion of the RMW command that was placed on the read queue 120 and executed from the read queue. The portion placed on the read queue 120 is processed and executed from the read queue 120 as is known in the prior art. The memory controller 100 then combines data returned from the read command (represented by arrow 144) with the partial RMW data of the original RMW command (represented by arrow 146) into a single write command and places the write command on the write queue 130. The combining or merging of the data is done in a register or in the data queues (not shown) that are associated with the command queues. The associated data queues are known in the prior art and are not shown for simplicity.
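A minimal software sketch of this translation follows. It is a model under stated assumptions rather than the hardware described: the 8-byte line size, the names `merge_partial` and `translate_rmw`, the dictionary fields, and the use of a Python dict as "memory" are all hypothetical, and the read is serviced immediately instead of waiting its turn on the read queue.

```python
CACHE_LINE_BYTES = 8  # assumed line size, kept small for illustration


def merge_partial(line: bytes, offset: int, partial: bytes) -> bytes:
    """Overlay the partial RMW data onto the full cache line that was read."""
    merged = bytearray(line)
    merged[offset:offset + len(partial)] = partial
    return bytes(merged)


def translate_rmw(rmw: dict, memory: dict, read_queue: list, write_queue: list) -> None:
    """Turn one RMW queue entry into a read followed by a full-line write."""
    # Arrow 142: the read portion is placed on the read queue.
    read_cmd = {"type": "read", "addr": rmw["addr"], "rmw": True}
    read_queue.append(read_cmd)
    # The read executes from the read queue like any other read
    # (simulated here by servicing it right away).
    read_queue.remove(read_cmd)
    line = memory[read_cmd["addr"]]                          # arrow 144
    data = merge_partial(line, rmw["offset"], rmw["data"])   # arrow 146
    # The merged, full-line write is placed on the write queue.
    write_queue.append({"type": "write", "addr": rmw["addr"], "data": data})


memory = {0x40: bytes(CACHE_LINE_BYTES)}
read_q, write_q = [], []
translate_rmw({"addr": 0x40, "offset": 2, "data": b"\xAA\xBB"}, memory, read_q, write_q)
```

After the call, the write queue holds a single full-line write whose bytes 2 and 3 carry the partial RMW data, so the write queue never contains an entry that is still waiting for read data.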

In preferred embodiments, command processing in the RMW queue is deferred to achieve various advantages. Rather than process a single RMW command, the memory controller 100 may defer the processing of the RMW command until meeting certain conditions or until there is a certain number of commands in the queue. The deferring of commands allows for optimization and clustering as described further below. The memory controller 100 may defer based on a low water mark, a high water mark, a full indicator and/or a timer.
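The text names the deferral triggers but does not say how they interact, so the following is one possible interpretation only. It assumes that draining starts when the queue is full, reaches the high water mark, or its oldest entry exceeds a timeout, and that draining stops once the depth falls to the low water mark. The class name `RmwDeferPolicy` and all parameters are hypothetical.

```python
class RmwDeferPolicy:
    """Hypothetical deferral policy for the RMW queue (assumed semantics)."""

    def __init__(self, capacity, low_water, high_water, timeout_cycles):
        self.capacity = capacity              # full indicator threshold
        self.low_water = low_water            # stop draining at or below this depth
        self.high_water = high_water          # start draining at or above this depth
        self.timeout_cycles = timeout_cycles  # maximum age of the oldest entry
        self.draining = False

    def update(self, depth, oldest_age):
        """Return True if RMW commands should be processed this cycle."""
        if depth == 0:
            self.draining = False
        elif (depth >= self.capacity
              or depth >= self.high_water
              or oldest_age >= self.timeout_cycles):
            self.draining = True
        elif depth <= self.low_water:
            self.draining = False
        # Between the two marks the previous decision is kept (hysteresis).
        return self.draining


policy = RmwDeferPolicy(capacity=8, low_water=2, high_water=6, timeout_cycles=100)
trace = [policy.update(d, age) for d, age in
         [(3, 0), (6, 0), (4, 0), (2, 0), (1, 100), (0, 0)]]
```

Deferring in this way lets several RMW commands accumulate before any of them is translated, which is what makes combining entries to the same cacheline possible.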

The architecture of the most preferred embodiments facilitates the use of command clustering and optimization. Command clustering is where the memory controller 100 gathers disparate write and read commands and combines them together for increased efficiency of memory reads and writes. Command clustering in the write queue 130 is simplified compared to the prior art since all commands in the queue are ready to execute, because there are no RMW commands waiting for data in the write queue 130. Command clustering in the RMW queue 135 is also simplified because it is separate from the queues dealing directly with execution. Clustering on the RMW queue 135 can also be done with less interruption of the execution process since accessing the RMW queue 135 can be done in parallel with execution occurring in the other queues. Clustering and optimization of RMW commands can also be accomplished as described below.

Again referring to FIG. 1, the memory controller 100 can perform optimizations of commands on the RMW queue 135. The memory controller 100 first attempts to combine RMW commands on the RMW queue 135. The memory controller 100 looks for RMW queue entries that are to the same cacheline. The memory controller 100 can combine entries on the RMW queue that are to the same cacheline. This combination can be done before the read or after the read of the data for the RMW commands. If the merged entries accumulate to a full cacheline, then any reads that may have been sent to the read queue can be cancelled. In another optimization, the memory controller 100 looks for RMW queue entries that are to the same cacheline as a write on the write queue 130. Since data on the write queue 130 is to a full cacheline, the memory controller 100 can combine entries on the RMW queue 135 that are to the same cacheline as the writes on the write queue 130 without performing a read of the data.
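Both optimizations can be illustrated with a short sketch. The representation is an assumption made for illustration: each RMW entry is a dict with a line address, a byte offset and partial data, the line size is 8 bytes, and a RMW entry is assumed to be newer than any write it is folded into, so its bytes overwrite the write's data. The function names are hypothetical.

```python
LINE_BYTES = 8  # assumed cache line size for illustration


def combine_rmw(entries):
    """Merge RMW entries that target the same cacheline; later bytes win."""
    merged = {}
    for e in entries:
        slot = merged.setdefault(e["addr"], {"addr": e["addr"], "bytes": {}})
        for i, b in enumerate(e["data"]):
            slot["bytes"][e["offset"] + i] = b
    return list(merged.values())


def needs_read(entry):
    """A read is unnecessary once the merged bytes cover the full line."""
    return len(entry["bytes"]) < LINE_BYTES


def fold_into_writes(rmw_entries, write_queue):
    """Merge RMW entries into a pending full-line write to the same line.

    Assumes the RMW entry is newer than the write it is merged into.
    Returns the RMW entries that still need their own handling.
    """
    remaining = []
    for e in rmw_entries:
        target = next((w for w in reversed(write_queue) if w["addr"] == e["addr"]), None)
        if target is None:
            remaining.append(e)
            continue
        line = bytearray(target["data"])
        for i, b in e["bytes"].items():
            line[i] = b
        target["data"] = bytes(line)  # no read of memory was required
    return remaining


combined = combine_rmw([
    {"addr": 0x40, "offset": 0, "data": b"\x01\x02\x03\x04"},
    {"addr": 0x40, "offset": 4, "data": b"\x05\x06\x07\x08"},
    {"addr": 0x80, "offset": 1, "data": b"\xff"},
])
write_q = [{"addr": 0x80, "data": bytes(LINE_BYTES)}]
left = fold_into_writes(combined, write_q)
```

Here the two entries for line 0x40 accumulate to a full cacheline, so no read is needed for them, while the entry for line 0x80 is absorbed by the pending write to that line.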

The timing diagram of FIG. 2 illustrates the timing according to the preferred embodiments. FIG. 2 also readily shows the difference in timing when compared with the prior art timing in FIG. 11. The activity on the memory controller is shown under the heading “Memory Bus Operation” compared with the timing of a “Memory Bus Clock.” A first RMW cycle is designated as RMW0. The RMW0 cycle has a read command 210 and a write command 220. The time between the read command 210 and the write command 220 is a RMW time 230. In contrast to the prior art memory controllers, the time 230 between the read portion of the RMW command 210 and the write portion 220 includes other access commands to the memory. In FIG. 2 the read portions of other RMW commands (RMW1, RMW2, and RMW3) are shown to be executed between the read and write of the RMW0 command. Note, however, that because the read command portion of a RMW command appears the same as a processor read command on the read queue 120, the read cycles labeled RMW1, RMW2 and RMW3 in FIG. 2 could also represent processor reads.

FIG. 3 illustrates a flow diagram of a method 300 for processing RMW operations in accordance with the preferred embodiments. Method 300 shows the logic of the memory controller 100 to translate the atomic read/modify/write operation into separate read and write operations as described above. Method 300 is the initial part of the logic for processing incoming commands to the memory controller 100. Upon receiving a new command, the controller checks if the command is a read command (step 310). If the command is a read command (step 310=yes) then the command is put on the read queue (step 320). If the command is not a read command (step 310=no) then the controller checks if it is a write command (step 330). If the command is a write command (step 330=yes) then the command is put on the write queue (step 340). If the command is not a write command (step 330=no) then the command must be a RMW command and the controller puts the command on the RMW queue (step 350).
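The decision flow of method 300 maps directly onto a small dispatch routine. The sketch below is illustrative only; the function name `enqueue_command` and the `"type"` field on each command are assumptions, and the step numbers in the comments refer to FIG. 3.

```python
from collections import deque


def enqueue_command(cmd, read_queue, write_queue, rmw_queue):
    """Route an incoming command to a queue, following steps 310-350."""
    if cmd["type"] == "read":       # step 310 = yes
        read_queue.append(cmd)      # step 320
    elif cmd["type"] == "write":    # step 330 = yes
        write_queue.append(cmd)     # step 340
    else:                           # step 330 = no: must be a RMW command
        rmw_queue.append(cmd)       # step 350


read_q, write_q, rmw_q = deque(), deque(), deque()
incoming = [
    {"type": "read", "addr": 0x100},
    {"type": "rmw", "addr": 0x140},
    {"type": "write", "addr": 0x180},
    {"type": "rmw", "addr": 0x1C0},
]
for command in incoming:
    enqueue_command(command, read_q, write_q, rmw_q)
```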

FIG. 4 illustrates a flow diagram of a method 400 for processing RMW operations in accordance with the preferred embodiments. Method 400 shows the logic of the memory controller 100 to execute a RMW command on the RMW queue to translate the RMW command into separate read and write operations as described above. The controller first writes the read portion of the RMW command in the RMW queue to the read queue (step 410). The controller then waits for data from the read command (step 420) to be returned from the read portion of the RMW command that was placed on the read queue and executed from the read queue. The controller then combines data returned from the read queue with the partial RMW data of the original RMW command into a single write command and places the write command on the write queue (step 430).

FIG. 5 illustrates a flow diagram of a method 500 for processing RMW operations in accordance with the preferred embodiments. Method 500 shows the logic of the memory controller 100 to combine RMW commands on the RMW queue. The controller first looks for RMW queue entries that are to the same cacheline (step 510). The controller combines entries to the same cacheline on the RMW queue (step 520). The controller then looks for RMW queue entries that are to the same cacheline as a write on the write queue (step 530). The controller combines the RMW command on the RMW queue and the write command on the write queue into the write command on the write queue (step 540).

Referring to FIG. 6, another memory controller 600 in accordance with the preferred embodiments is shown. The features and operation of memory controller 600 are similar to those described above with reference to FIG. 1. However, in this embodiment, the RMW commands are placed in the write queue 630 along with write commands. The memory controller 600 includes a read queue 620, a write queue 630 and command formatting logic 640. A read command 650 from a processor is written to the read queue 620. A write command 660 from the processor is written to the write queue 630. A RMW command 665 from the processor is also written to the write queue 630. The memory controller includes a control register 635 for each entry location in the write queue 630, or at least those entries that are used for RMW commands. The control register 635 may include one or more register bits or flags used by the memory controller 600 for executing the RMW command from the write queue 630. The control register for the described embodiment includes a RMW flag to indicate the command is a RMW, and a dependency flag to indicate the command is waiting for a read command to complete.

FIG. 7 illustrates a flow diagram of a method 700 for processing RMW operations in accordance with the preferred embodiments related to FIG. 6. Method 700 shows the logic of the memory controller 600 to translate the atomic read/modify/write operation into separate read and write operations as described above. Method 700 is the initial part of the logic for processing incoming commands to the memory controller 600. Upon receiving a new command, the controller checks if the command is a read command (step 710). If the command is a read command (step 710=yes) then the command is put on the read queue (step 720). If the command is not a read command (step 710=no) then the controller checks if it is a write command (step 730). If the command is a write command (step 730=yes) then the command is put on the write queue (step 740). If the command is not a write command (step 730=no) then the command must be a RMW command and the controller puts the command on the write queue and sets a RMW flag or a dependency indicator associated with the command in the write queue (step 750).

FIG. 8 illustrates a flow diagram of a method 800 for processing RMW operations in accordance with the preferred embodiments. Method 800 shows the logic of the memory controller 600 to execute a RMW command on the write queue. The controller first writes the read portion of the RMW command in the write queue to the read queue and sets a dependency flag (step 810). The controller then waits for data from the read command (step 820) to be returned from the read portion of the RMW command that was placed on the read queue and executed from the read queue. The controller then combines data returned from the read queue with the partial RMW data of the original RMW command into a single write operation and places the command on the write queue and clears the dependency flag (step 830).
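The role of the dependency flag in method 800 can be sketched as follows. This is a simplified model under assumptions: write queue entries are dicts whose `"rmw"` and `"dependency"` keys stand in for the flags of the control register 635, the line size is 8 bytes, and the function names are hypothetical. A write entry is treated as eligible for execution only while its dependency flag is clear.

```python
def start_rmw(entry, read_queue):
    """Step 810: queue the read portion and set the dependency flag."""
    read_queue.append({"addr": entry["addr"], "rmw": True})
    entry["dependency"] = True


def complete_rmw(entry, line):
    """Step 830: merge the returned line with the partial data, clear the flag."""
    merged = bytearray(line)
    off = entry["offset"]
    merged[off:off + len(entry["partial"])] = entry["partial"]
    entry["data"] = bytes(merged)
    entry["dependency"] = False


def ready_writes(write_queue):
    """Writes that may execute: those not waiting on a RMW read."""
    return [e for e in write_queue if not e["dependency"]]


write_q = [
    {"addr": 0x00, "data": bytes(8), "rmw": False, "dependency": False},
    {"addr": 0x40, "offset": 6, "partial": b"\x11\x22", "rmw": True, "dependency": False},
]
read_q = []
start_rmw(write_q[1], read_q)
before = [e["addr"] for e in ready_writes(write_q)]
complete_rmw(write_q[1], bytes(8))  # step 820 is modeled as the read data arriving
after = [e["addr"] for e in ready_writes(write_q)]
```

While the flag is set, only the ordinary write at address 0x00 is eligible; once the read data has been merged and the flag cleared, the RMW write at 0x40 becomes eligible as well, which preserves the required read-then-write sequence.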

FIG. 9 illustrates a flow diagram of a method 900 for processing RMW operations in accordance with the preferred embodiments. Method 900 shows the logic of the memory controller 600 to combine RMW commands on the write queue. The controller first looks for RMW entries on the write queue that are to the same cacheline (step 910). The controller combines entries to the same cacheline on the write queue (step 920). The controller then looks for RMW entries that are to the same cacheline as a write command on the write queue (step 930). The controller combines the RMW command on the write queue and the write command on the write queue into the write command on the write queue (step 940).

The embodiments described herein provide important improvements over the prior art. The memory controller optimizes RMW commands by breaking them into separate and unique read and write commands. The embodiments allow the memory controller to translate RMW commands into read and write commands with the proper sequence of execution to preserve data consistency. The preferred embodiments will provide the computer industry with increased memory bandwidth during Read/Modify/Write cycles for an overall increase in computer system performance.

One skilled in the art will appreciate that many variations are possible within the scope of the present invention. Thus, while the invention has been particularly shown and described with reference to preferred embodiments thereof, it will be understood by those skilled in the art that these and other changes in form and details may be made therein without departing from the spirit and scope of the invention. For example, while the preferred embodiments are discussed herein with particular regard to DRAMs, the memory controller and methods of the preferred embodiments may be applied to any semiconductor memory including embedded memory systems.

1) A method for a memory controller to access memory, the method comprising the steps of: writing a read command to a read queue; writing a write command to a write queue; writing a read-modify-write (RMW) command to a RMW queue; translating the RMW command on the RMW queue into a read command on the read queue and a write command on the write queue; and controlling a sequence of executing the read command and the write command.

2) The method of claim 1, further comprising the step of removing the RMW command from the RMW queue after receiving results from executing the read command and writing the write command to the write queue.

3) The method of claim 1, further comprising the step of setting a dependency indicator for the RMW command in the write queue.

4) The method of claim 1, wherein a read command on the read queue is executed after commencing a read portion of a read-modify-write cycle and before completion of the read-modify-write cycle.

5) A method for a memory controller to access memory, the method comprising the steps of: writing a read command to a read queue; writing a write command to a write queue; writing a read-modify-write (RMW) command to a RMW queue; writing a read command portion of the RMW command to the read queue; waiting for data from an executing read command portion of the RMW command; combining the data from the read command portion of the RMW command with partial data from the RMW command into a single write command; and writing the single write command to the write queue.

6) The method of claim 5 wherein a read command on the read queue may be executed after commencing a read portion of a read-modify-write cycle and before completion of the read-modify-write cycle.